2025-06-10
“Graphical excellence is the well-designed presentation of interesting data—a matter of substance, of statistics, and of design…”
Edward Tufte, The Visual Display of Quantitative Information, 1983
“It is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space…”
Edward Tufte, The Visual Display of Quantitative Information, 1983
“It is nearly always multivariate…”
Edward Tufte, The Visual Display of Quantitative Information, 1983
“Graphical excellence requires telling the truth about the data…”
Edward Tufte, The Visual Display of Quantitative Information, 1983
Charles Minard’s Napoleon’s March
“[Minard’s classic image] can be described and admired, but there are no compositional principles on how to create that one wonder graphic in a million.”
Edward Tufte, The Visual Display of Quantitative Information, 1983
Instead, Tufte suggests:
We will revisit more of Tufte’s principles throughout the course.
ggplot2
ggplot2 is a powerful package for creating data visualizations
git is a powerful tool for version control
git is not GitHub
git
pandas and numpy
matplotlib and seaborn
ggplot2 syntax, since LLMs really can mostly solve technical visualization problems
You may need to configure your git username and email.
On Windows, you can run this in “Git Bash”. On Mac, you can run this in the terminal.
I recommend these conventions:
work/
├── example_* # Learning examples
├── ps1_* # Problem set 1
├── ps2_* # Problem set 2
├── ps3_* # Problem set 3
├── final_project_* # Final project
├── shared_* # Shared resources
└── data/ # Data directory
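One practical payoff of the prefixes is that related files sort together and can be selected by pattern. The sketch below is only an illustration: it builds a throwaway directory under `tempdir()`, and the names `ps1_answers.qmd` and `shared_helpers.R` are hypothetical, not files from the course.

```r
# Throwaway directory standing in for work/ (assumption for illustration)
work_dir <- file.path(tempdir(), "work_naming_demo")
dir.create(work_dir, showWarnings = FALSE)

# File names following the prefix convention; the last two are hypothetical
demo_files <- c(
  "example_cars_1_data_prep.qmd",
  "example_cars_2_analysis.qmd",
  "ps1_answers.qmd",
  "shared_helpers.R"
)
invisible(file.create(file.path(work_dir, demo_files)))

# Select everything belonging to one unit of work by its prefix
example_files <- list.files(work_dir, pattern = "^example_")
ps1_files <- list.files(work_dir, pattern = "^ps1_")

print(example_files)
print(ps1_files)
```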
Flat Structure:
work/
├── data_prep.qmd
├── analysis.qmd
└── data/
Nested Structure:
work/
├── data_prep/
│ └── data_prep.qmd
├── analysis/
│ └── analysis.qmd
└── data/
Relative Paths are Tricky!
../../data/file.csv are:
.Rproj File
here Package
../../ counting
git
git
examples/project-example/_quarto.yml for configuration
data/ directory
.gitignore
work/ using RStudio (File -> New Project -> Existing Directory)
I have tested this and RStudio handles the remote repository in the directory one higher up.
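To make the relative-path problem concrete, here is a minimal base R sketch. It assumes a throwaway copy of the nested layout under `tempdir()` and a made-up `file.csv`. The same relative string resolves or fails depending on the directory the code runs from, which is why counting `../` levels is fragile. The `here` package addresses this by building paths from the project root (it can locate the root from the `.Rproj` file); the sketch imitates that idea with a fixed `root` variable rather than calling `here` itself.

```r
# Throwaway copy of the nested layout (assumption for illustration)
root <- file.path(tempdir(), "work_paths_demo")
dir.create(file.path(root, "analysis"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "data"), showWarnings = FALSE)
write.csv(data.frame(x = 1:3), file.path(root, "data", "file.csv"),
          row.names = FALSE)

# Pretend we are running a document that lives in work/analysis/
old_wd <- setwd(file.path(root, "analysis"))

# From analysis/, the data directory is one level up
found_one_up <- file.exists("../data/file.csv")

# The path that works in the flat layout does not resolve from here
found_same_level <- file.exists("data/file.csv")

setwd(old_wd)

# Building the path from a known project root avoids the counting.
# With the here package this would be here::here("data", "file.csv").
from_root <- file.path(root, "data", "file.csv")
found_from_root <- file.exists(from_root)

print(c(one_up = found_one_up,
        same_level = found_same_level,
        from_root = found_from_root))
```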
Important!
The work/ directory is your personal workspace for everything in this course:
You are responsible for:
This is your space - keep it clean and organized!
Let’s get a file set up to work with Quarto and have data to read from.
_quarto.yml from examples to _quarto.yml in your new project
data/ directory
We are now going to create a file called .gitignore to tell git to ignore certain files.
.gitignore (if it doesn’t already exist)
.gitignore
Never Commit Sensitive Data!
.gitignore tells Git which files to ignore
.gitignore Matters
.gitignore Setup
With the course-wide .gitignore repository file, you will see these lines:
work/data/*
work/.Rproj.user/
work/.Rhistory
work/.RData
work/data/*: Keeps all data files in the work directory local
work/.Rproj.user: RStudio temporary files
work/.Rhistory: Command history
work/.RData: R workspace files
In RStudio:
example_cars_1_data_prep.qmd in your work/ directory
In RStudio:
You can remove the editor: visual line – we’re going to try to work with text.
Let’s create the data preparation setup file.
Include this as a setup block:
```{r setup-prep}
#| echo: false
#| message: false
library(dplyr)
library(readr)
library(stringr)
library(ggplot2)
```
And then this to load and prepare the data:
```{r load-data}
# Load and prepare data
mtcars_clean <- mtcars |>
  mutate(
    car_name = rownames(mtcars),
    make = word(car_name, 1), # First word is make
    model = str_remove(car_name, paste0(make, " ")), # Rest is model
    efficiency = mpg / wt # MPG per 1000 lbs (wt is recorded in 1000 lbs)
  )

# Save processed data (the data/ directory must already exist)
write_csv(mtcars_clean, "data/mtcars_clean.csv")
```
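Before rendering, it can be worth checking how the make/model split behaves on names that do not fit the "first word is make" pattern. The sketch below is a base R cross-check, not the chunk above: it uses `sub()` in place of `word()` and `str_remove()` so that it runs without extra packages, and the variable names are my own. It flags car names with no space, where the model ends up identical to the make.

```r
car_name <- rownames(mtcars)

# Base R stand-ins for the stringr calls (assumption: same intent, different functions)
make <- sub("\\s.*$", "", car_name)      # everything before the first space
model <- sub("^\\S+\\s", "", car_name)   # everything after the first space

# Names without a space are left unchanged, so model equals make for them
single_word <- car_name[make == model]

# wt is recorded in units of 1000 lbs, so this is MPG per 1000 lbs
efficiency <- mtcars$mpg / mtcars$wt

check <- data.frame(car_name, make, model, efficiency)
print(head(check))
print(single_word)
```

In `mtcars` this catches names such as "Valiant", which is worth keeping in mind when grouping by `make` later.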
Render the file to see the results (click the “Render” button above the editor)
We will get two outputs:
data/ directory
In RStudio:
example_cars_2_analysis.qmd in your work/ directory
```{r setup-analysis}
#| echo: false
#| message: false
library(dplyr)
library(readr)
library(ggplot2)
library(forcats)
```
```{r load-processed}
# Load processed data
df <- read_csv("data/mtcars_clean.csv")
df |> head()
```
You have:
Now, let’s set up version control in your project.
In the terminal:
git add .
git commit -m "Description of changes"
git push
Choose whichever method you’re most comfortable with as both accomplish the same thing!
From here on out, it’s up to you to create the code blocks, such as below:
```{r}
# Code goes here
```
ggplot2
ggplot2
gg or grammar of graphics
What’s wrong with this?
efficiency variable
efficiency_by_make <- df |>
  group_by(make) |>
  summarise(avg_efficiency = mean(efficiency)) |>
  mutate(make = fct_reorder(make, avg_efficiency)) |>
  ggplot(aes(x = make, y = avg_efficiency)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  theme_minimal() +
  theme(panel.grid.major.y = element_blank()) +
  labs(
    title = "Average Fuel Efficiency by Make",
    x = NULL,
    y = "Average Efficiency (MPG/1000 lbs)"
  )
Graduate Summer Institute of Epidemiology and Biostatistics